Topic:Temporal Convolutional Networks
What is Temporal Convolutional Networks? Temporal convolutional networks (TCNs) are deep learning models that use 1D convolutions for sequence modeling tasks.
Papers and Code
Jul 09, 2025
Abstract:Cell-free integrated sensing and communication (ISAC) systems have emerged as a promising paradigm for sixth-generation (6G) networks, enabling simultaneous high-rate data transmission and high-precision radar sensing through cooperative distributed access points (APs). Fully exploiting these capabilities requires a unified design that bridges system-level optimization with multi-target parameter estimation. This paper proposes an end-to-end graph learning approach to close this gap, modeling the entire cell-free ISAC network as a heterogeneous graph to jointly design the AP mode selection, user association, precoding, and echo signal processing for multi-target position and velocity estimation. In particular, we propose two novel heterogeneous graph learning frameworks: a dynamic graph learning framework and a lightweight mirror-based graph attention network (mirror-GAT) framework. The dynamic graph learning framework employs structural and temporal attention mechanisms integrated with a three-dimensional convolutional neural network (3D-CNN), enabling superior performance and robustness in cell-free ISAC environments. Conversely, the mirror-GAT framework significantly reduces computational complexity and signaling overhead through a bi-level iterative structure with share adjacency. Simulation results validate that both proposed graph-learning-based frameworks achieve significant improvements in multi-target position and velocity estimation accuracy compared to conventional heuristic and optimization-based designs. Particularly, the mirror-GAT framework demonstrates substantial reductions in computational time and signaling overhead, underscoring its suitability for practical deployments.
* Under review
Via

Jul 08, 2025
Abstract:Machine learning methods often struggle with real-world applications in science and engineering due to limited or low-quality training data. In this work, the example of groundwater flow with heat transport is considered; this corresponds to an advection-diffusion process under heterogeneous flow conditions, that is, spatially distributed material parameters and heat sources. Classical numerical simulations are costly and challenging due to high spatio-temporal resolution requirements and large domains. While often computationally more efficient, purely data-driven surrogate models face difficulties, particularly in predicting the advection process, which is highly sensitive to input variations and involves long-range spatial interactions. Therefore, in this work, a Local-Global Convolutional Neural Network (LGCNN) approach is introduced. It combines a lightweight numerical surrogate for the transport process (global) with convolutional neural networks for the groundwater velocity and heat diffusion processes (local). With the LGCNN, a city-wide subsurface temperature field is modeled, involving a heterogeneous groundwater flow field and one hundred groundwater heat pump injection points forming interacting heat plumes over long distances. The model is first systematically analyzed based on random subsurface input fields. Then, the model is trained on a handful of cut-outs from a real-world subsurface map of the Munich region in Germany, and it scales to larger cut-outs without retraining. All datasets, our code, and trained models are published for reproducibility.
Via

Jul 02, 2025
Abstract:Estimation of model uncertainty can help improve the explainability of Graph Convolutional Networks and the accuracy of the models at the same time. Uncertainty can also be used in critical applications to verify the results of the model by an expert or additional models. In this paper, we propose Variational Neural Network versions of spatial and spatio-temporal Graph Convolutional Networks. We estimate uncertainty in both outputs and layer-wise attentions of the models, which has the potential for improving model explainability. We showcase the benefits of these models in the social trading analysis and the skeleton-based human action recognition tasks on the Finnish board membership, NTU-60, NTU-120 and Kinetics datasets, where we show improvement in model accuracy in addition to estimated model uncertainties.
* This work has been submitted to the IEEE for possible publication. 9
pages, 6 figures
Via

Jul 02, 2025
Abstract:Edge devices for temporal processing demand models that capture both short- and long- range dynamics under tight memory constraints. While Transformers excel at sequence modeling, their quadratic memory scaling with sequence length makes them impractical for such settings. Recurrent Neural Networks (RNNs) offer constant memory but train sequentially, and Temporal Convolutional Networks (TCNs), though efficient, scale memory with kernel size. To address this, we propose mGRADE (mininally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit (minGRU). This design allows the convolutional layer to realize a flexible delay embedding that captures rapid temporal variations, while the recurrent module efficiently maintains global context with minimal memory overhead. We validate our approach on two synthetic tasks, demonstrating that mGRADE effectively separates and preserves multi-scale temporal features. Furthermore, on challenging pixel-by-pixel image classification benchmarks, mGRADE consistently outperforms both pure convolutional and pure recurrent counterparts using approximately 20% less memory footprint, highlighting its suitability for memory-constrained temporal processing at the edge. This highlights mGRADE's promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.
Via

Jul 03, 2025
Abstract:The primary objective of human activity recognition (HAR) is to infer ongoing human actions from sensor data, a task that finds broad applications in health monitoring, safety protection, and sports analysis. Despite proliferating research, HAR still faces key challenges, including the scarcity of labeled samples for rare activities, insufficient extraction of high-level features, and suboptimal model performance on lightweight devices. To address these issues, this paper proposes a comprehensive optimization approach centered on multi-attention interaction mechanisms. First, an unsupervised, statistics-guided diffusion model is employed to perform data augmentation, thereby alleviating the problems of labeled data scarcity and severe class imbalance. Second, a multi-branch spatio-temporal interaction network is designed, which captures multi-scale features of sequential data through parallel residual branches with 3*3, 5*5, and 7*7 convolutional kernels. Simultaneously, temporal attention mechanisms are incorporated to identify critical time points, while spatial attention enhances inter-sensor interactions. A cross-branch feature fusion unit is further introduced to improve the overall feature representation capability. Finally, an adaptive multi-loss function fusion strategy is integrated, allowing for dynamic adjustment of loss weights and overall model optimization. Experimental results on three public datasets, WISDM, PAMAP2, and OPPORTUNITY, demonstrate that the proposed unsupervised data augmentation spatio-temporal attention diffusion network (USAD) achieves accuracies of 98.84%, 93.81%, and 80.92% respectively, significantly outperforming existing approaches. Furthermore, practical deployment on embedded devices verifies the efficiency and feasibility of the proposed method.
Via

Jul 02, 2025
Abstract:We investigate a novel approach to time-series modeling, inspired by the successes of large pretrained foundation models. We introduce FAE (Foundation Auto-Encoders), a foundation generative-AI model for anomaly detection in time-series data, based on Variational Auto-Encoders (VAEs). By foundation, we mean a model pretrained on massive amounts of time-series data which can learn complex temporal patterns useful for accurate modeling, forecasting, and detection of anomalies on previously unseen datasets. FAE leverages VAEs and Dilated Convolutional Neural Networks (DCNNs) to build a generic model for univariate time-series modeling, which could eventually perform properly in out-of-the-box, zero-shot anomaly detection applications. We introduce the main concepts of FAE, and present preliminary results in different multi-dimensional time-series datasets from various domains, including a real dataset from an operational mobile ISP, and the well known KDD 2021 Anomaly Detection dataset.
* Presented at ACM KDD 2024, MiLeTS 2024 Workshop, August 25, 2024,
Barcelona, Spain
Via

Jul 02, 2025
Abstract:Grey matter loss in the hippocampus is a hallmark of neurobiological aging, yet understanding the corresponding changes in its functional connectivity remains limited. Seed-based functional connectivity (FC) analysis enables voxel-wise mapping of the hippocampus's synchronous activity with cortical regions, offering a window into functional reorganization during aging. In this study, we develop an interpretable deep learning framework to predict brain age from hippocampal FC using a three-dimensional convolutional neural network (3D CNN) combined with LayerCAM saliency mapping. This approach maps key hippocampal-cortical connections, particularly with the precuneus, cuneus, posterior cingulate cortex, parahippocampal cortex, left superior parietal lobule, and right superior temporal sulcus, that are highly sensitive to age. Critically, disaggregating anterior and posterior hippocampal FC reveals distinct mapping aligned with their known functional specializations. These findings provide new insights into the functional mechanisms of hippocampal aging and demonstrate the power of explainable deep learning to uncover biologically meaningful patterns in neuroimaging data.
Via

Jun 26, 2025
Abstract:Brain-computer interface (BCI) technology utilizing electroencephalography (EEG) marks a transformative innovation, empowering motor-impaired individuals to engage with their environment on equal footing. Despite its promising potential, developing subject-invariant and session-invariant BCI systems remains a significant challenge due to the inherent complexity and variability of neural activity across individuals and over time, compounded by EEG hardware constraints. While prior studies have sought to develop robust BCI systems, existing approaches remain ineffective in capturing the intricate spatiotemporal dependencies within multichannel EEG signals. This study addresses this gap by introducing the attentive graph-temporal convolutional network (AGTCNet), a novel graph-temporal model for motor imagery EEG (MI-EEG) classification. Specifically, AGTCNet leverages the topographic configuration of EEG electrodes as an inductive bias and integrates graph convolutional attention network (GCAT) to jointly learn expressive spatiotemporal EEG representations. The proposed model significantly outperformed existing MI-EEG classifiers, achieving state-of-the-art performance while utilizing a compact architecture, underscoring its effectiveness and practicality for BCI deployment. With a 49.87% reduction in model size, 64.65% faster inference time, and shorter input EEG signal, AGTCNet achieved a moving average accuracy of 66.82% for subject-independent classification on the BCI Competition IV Dataset 2a, which further improved to 82.88% when fine-tuned for subject-specific classification. On the EEG Motor Movement/Imagery Dataset, AGTCNet achieved moving average accuracies of 64.14% and 85.22% for 4-class and 2-class subject-independent classifications, respectively, with further improvements to 72.13% and 90.54% for subject-specific classifications.
* This work has been submitted to the IEEE for possible publication
Via

Jun 17, 2025
Abstract:Summary: Errors in gradient trajectories introduce significant artifacts and distortions in magnetic resonance images, particularly in non-Cartesian imaging sequences, where imperfect gradient waveforms can greatly reduce image quality. Purpose: Our objective is to develop a general, nonlinear gradient system model that can accurately predict gradient distortions using convolutional networks. Methods: A set of training gradient waveforms were measured on a small animal imaging system, and used to train a temporal convolutional network to predict the gradient waveforms produced by the imaging system. Results: The trained network was able to accurately predict nonlinear distortions produced by the gradient system. Network prediction of gradient waveforms was incorporated into the image reconstruction pipeline and provided improvements in image quality and diffusion parameter mapping compared to both the nominal gradient waveform and the gradient impulse response function. Conclusion: Temporal convolutional networks can more accurately model gradient system behavior than existing linear methods and may be used to retrospectively correct gradient errors.
Via

Jun 25, 2025
Abstract:Recently, dense video captioning has made attractive progress in detecting and captioning all events in a long untrimmed video. Despite promising results were achieved, most existing methods do not sufficiently explore the scene evolution within an event temporal proposal for captioning, and therefore perform less satisfactorily when the scenes and objects change over a relatively long proposal. To address this problem, we propose a graph-based partition-and-summarization (GPaS) framework for dense video captioning within two stages. For the ``partition" stage, a whole event proposal is split into short video segments for captioning at a finer level. For the ``summarization" stage, the generated sentences carrying rich description information for each segment are summarized into one sentence to describe the whole event. We particularly focus on the ``summarization" stage, and propose a framework that effectively exploits the relationship between semantic words for summarization. We achieve this goal by treating semantic words as nodes in a graph and learning their interactions by coupling Graph Convolutional Network (GCN) and Long Short Term Memory (LSTM), with the aid of visual cues. Two schemes of GCN-LSTM Interaction (GLI) modules are proposed for seamless integration of GCN and LSTM. The effectiveness of our approach is demonstrated via an extensive comparison with the state-of-the-arts methods on the two benchmarks ActivityNet Captions dataset and YouCook II dataset.
* 12 pages
Via
